Metropolis(-Hastings) random walk

Acceptance ratio for a proposal \(\theta^*\) drawn from the jumping distribution \(J\):

\[ \begin{aligned} r = \frac{\frac{p(\theta^*|y)}{J(\theta^*|\theta^{t-1})}}{\frac{p(\theta^{t-1}|y)}{J(\theta^{t-1}|\theta^*)}} \end{aligned} \]
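A minimal random-walk Metropolis sampler in R illustrates this ratio. This is only a sketch: the standard-normal target and the proposal scale are illustrative assumptions, and with a symmetric proposal the two \(J\) terms cancel, so the ratio reduces to a ratio of posterior densities.

```r
# Random-walk Metropolis: the proposal is symmetric, so the J(.|.) terms
# cancel and r reduces to p(theta*|y) / p(theta^{t-1}|y).
metropolis <- function(log_post, theta_init, scale, iterations) {
  draws <- numeric(iterations)
  theta <- theta_init
  for (t in seq_len(iterations)) {
    theta_star <- rnorm(1, mean = theta, sd = scale)  # propose a jump
    r <- exp(log_post(theta_star) - log_post(theta))  # acceptance ratio
    if (runif(1) < r) theta <- theta_star             # accept/reject
    draws[t] <- theta
  }
  draws
}

# Illustrative target: standard normal (log density up to a constant)
set.seed(1)
draws <- metropolis(function(theta) -theta^2 / 2, 0, 2.4, 10000)
```

Working on the log scale avoids numerical underflow for small densities; the scale 2.4 follows the usual rule of thumb of roughly 2.4 times the posterior standard deviation.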

Relation between jumping rule and convergence

\(J(\cdot)\) is only a useful jumping distribution if:

(see Gelman et al. (2013))

But how can we know that?

Problem

Feng (2016)

Idea

\(\rightarrow\) the resulting algorithm is a hybrid Monte Carlo method: a mix of the familiar random walk and deterministic simulation methods derived from Hamiltonian dynamics

Ingredients

\[ \frac{d \log p(\theta|y)}{d\theta} \] (Note: use analytic gradients here if possible; numerical differentiation might make the computational benefit vanish)
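For instance, for a normal posterior \(N(\mu, \sigma^2)\) the gradient of the log density is available in closed form. A sketch comparing it against a central finite-difference approximation (the normal example and function names are illustrative assumptions):

```r
# Analytic gradient of log N(theta | mu, sigma^2) w.r.t. theta:
# d/dtheta [-(theta - mu)^2 / (2 sigma^2)] = (mu - theta) / sigma^2
grad_analytic <- function(theta, mu = 0, sigma = 1) (mu - theta) / sigma^2

# Numerical counterpart via central finite differences; this needs two
# extra density evaluations per coordinate, which is what can make the
# computational benefit of HMC vanish for expensive posteriors.
grad_numeric <- function(theta, mu = 0, sigma = 1, h = 1e-6) {
  log_p <- function(th) dnorm(th, mu, sigma, log = TRUE)
  (log_p(theta + h) - log_p(theta - h)) / (2 * h)
}
```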

Hamiltonian Monte Carlo

In 3 steps:

  1. update \(\phi\), drawn from its posterior \(p(\phi)\)
  2. simultaneous update of \((\theta,\phi)\) via leapfrog steps
  3. acceptance/rejection step analogous to the MH algorithm
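Step 2 relies on the leapfrog integrator: a half-step for the momentum, a full step for the position, and another half-step for the momentum. A minimal sketch in R (the standard-normal sanity check and all names are illustrative assumptions):

```r
# One leapfrog step for the state (theta, phi) with step size epsilon.
# gradient(theta) must return d log p(theta|y) / d theta.
leapfrog <- function(theta, phi, epsilon, gradient) {
  phi   <- phi + (epsilon / 2) * gradient(theta)  # half-step momentum
  theta <- theta + epsilon * phi                  # full-step position
  phi   <- phi + (epsilon / 2) * gradient(theta)  # half-step momentum
  list(theta = theta, phi = phi)
}

# Sanity check on a standard-normal target, where d log p / d theta = -theta:
# the Hamiltonian H = theta^2/2 + phi^2/2 should be nearly conserved.
gradient <- function(theta) -theta
state <- list(theta = 1, phi = 0.5)
H0 <- state$theta^2 / 2 + state$phi^2 / 2
for (i in 1:10) state <- leapfrog(state$theta, state$phi, 0.1, gradient)
H1 <- state$theta^2 / 2 + state$phi^2 / 2
```

The near-conservation of \(H\) is what makes the later acceptance/rejection step accept almost all proposals for small \(\epsilon\).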

Intuition of leapfrog steps

  • Think of 1/2/3D density curvature
  • preserving the joint posterior \(p(\theta,\phi|y)\) in the limit \(\epsilon \to 0\)
  • Suppose HMC moves towards low posterior density regions:

\[ \operatorname{sgn}\Big(\frac{d \log p(\theta|y)}{d\theta}\Big) = -1 \ \Rightarrow \ \phi\ \text{decreases} \]

  • Suppose HMC moves towards a local posterior maximum:

\[ \frac{d \log p(\theta|y)}{d\theta} = 0 \ \Rightarrow \ \theta\ \text{slows as it moves closer to the local maximum} \]

\(\rightarrow\) HMC works like a mode-finder
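A quick numerical illustration of the first case, as a sketch on a standard-normal target (an illustrative assumption), where \(d \log p/d\theta = -\theta\): starting right of the mode and moving further right, the gradient is negative, so every leapfrog step shrinks the momentum.

```r
# Standard-normal target: d log p(theta|y) / d theta = -theta.
# Start at theta = 1 with positive momentum, i.e. moving toward lower
# posterior density; the negative gradient steadily decreases phi.
grad <- function(theta) -theta
theta <- 1; phi <- 1; epsilon <- 0.1
phi_trace <- phi
for (i in 1:5) {
  phi   <- phi + (epsilon / 2) * grad(theta)  # half-step momentum
  theta <- theta + epsilon * phi              # full-step position
  phi   <- phi + (epsilon / 2) * grad(theta)  # half-step momentum
  phi_trace <- c(phi_trace, phi)
}
```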

Physical analogy

Code

## hmc <- function(position_init, stepsize, leapfrogsteps, iterations) {
##   position <- position_init
##   for (iter in seq_len(iterations)) {
##     # 1. draw a fresh momentum vector
##     momentum <- rnorm(length(position_init))
##     proposal <- list("position" = position, "momentum" = momentum)
##     # 2. deterministic trajectory of leapfrog steps
##     for (step in seq_len(leapfrogsteps)) {
##       grad <- gradient(proposal$position)
##       proposal <- do.call(leapfrog, c(proposal, stepsize, grad))
##     }
##     # 3. Metropolis acceptance/rejection step
##     acceptance <- min(1, joint_probability(proposal$position, proposal$momentum) /
##                          joint_probability(position, momentum))
##     if (runif(1) < acceptance) position <- proposal$position
##   }
##   position
## }
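The helpers (`gradient`, `leapfrog`, `joint_probability`) are left abstract above. Filling them in for a standard-normal target gives a self-contained sketch; the target and all concrete choices are illustrative assumptions, not the package's actual implementation:

```r
# Self-contained HMC sketch for the target p(theta|y) = N(0, 1).
gradient <- function(theta) -theta               # d log p / d theta
joint_probability <- function(position, momentum) # exp(-H(theta, phi))
  exp(-position^2 / 2 - sum(momentum^2) / 2)
leapfrog <- function(position, momentum, stepsize, gradient) {
  momentum <- momentum + (stepsize / 2) * gradient(position)
  position <- position + stepsize * momentum
  momentum <- momentum + (stepsize / 2) * gradient(position)
  list(position = position, momentum = momentum)
}

hmc <- function(position_init, stepsize, leapfrogsteps, iterations) {
  position <- position_init
  draws <- numeric(iterations)
  for (iter in seq_len(iterations)) {
    momentum <- rnorm(length(position_init))       # fresh momentum
    proposal <- list(position = position, momentum = momentum)
    for (step in seq_len(leapfrogsteps)) {         # simulate trajectory
      proposal <- leapfrog(proposal$position, proposal$momentum,
                           stepsize, gradient)
    }
    acceptance <- min(1, joint_probability(proposal$position, proposal$momentum) /
                         joint_probability(position, momentum))
    if (runif(1) < acceptance) position <- proposal$position
    draws[iter] <- position
  }
  draws
}

set.seed(42)
draws <- hmc(position_init = 0, stepsize = 0.1, leapfrogsteps = 10,
             iterations = 5000)
```

Note that \(\epsilon \mathcal{L} = 0.1 \times 10 = 1\) here, matching the rule of thumb discussed below.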

Problem

Feng (2016)

  • taking too few steps, or steps that are too large, \(\rightarrow \ \theta^t\ \& \ \theta^{t+1}\) end up very close.

Setting the tuning parameters

(Hand-)Tuning Parameters:

\(\epsilon\): step size
\(\mathcal{L}\): number of leapfrog steps
\(\phi\): momentum

There are several heuristics and strategies to set the tuning parameters:

  • Radius:

    stay within the radius of your target distribution. Rule of thumb: \(\epsilon \mathcal{L} = 1\)

  • adaptive updating:

    1. run \(M_{init}\) steps with initial setting

    2. adjust the parameters based on knowledge from previous run and rerun the model for \(M_{adjust}\) steps

  • Acceptance Sweet Spot: 65% acceptance rate (\(\alpha\))

    if \(\alpha < 0.65 \Rightarrow\) leapfrog jumps are too ambitious. Set \(\mathcal{L}\uparrow, \epsilon \downarrow\).

    if \(\alpha > 0.65 \Rightarrow\) leapfrog jumps are too cautious. Set \(\mathcal{L} \downarrow, \epsilon \uparrow\).
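The two sweet-spot rules can be sketched as a simple adjustment function. The multiplicative factors are illustrative assumptions (Stan, for comparison, tunes \(\epsilon\) with dual averaging rather than a fixed factor):

```r
# Crude hand-tuning rule based on the 65% acceptance sweet spot:
# below target, the jumps are too ambitious -> more but smaller steps;
# above target, too cautious -> fewer but larger steps.
adjust_tuning <- function(epsilon, L, alpha, target = 0.65) {
  if (alpha < target) {
    list(epsilon = epsilon * 0.9, L = ceiling(L * 1.1))       # too ambitious
  } else {
    list(epsilon = epsilon * 1.1, L = max(1, floor(L * 0.9))) # too cautious
  }
}
```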

Desirable for HMC (demonstrated later on):

  1. \(\epsilon\) getting smaller in areas of high curvature, so that diverse regions are explored.

  2. \(\mathcal{L}\) driving the trajectory of one iteration through the whole posterior space.

  3. \(\operatorname{Cov}(\phi)\) scaling with the local curvature.

(see Gelman et al. (2013))

@@ 1,3 @@
function(x) {
  x <- x + x
}

@@ 1,4 @@
function(x) {
  x <- x + x
  x
}

References

Feng, Chi. 2016. “MCMC Demo.” GitHub repository. https://github.com/chi-feng/mcmc-demo.

Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. CRC Press.